9 research outputs found

    New sublinear methods in the struggle against classical problems

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. By Krzysztof Onak. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 129-134).

    We study the time and query complexity of approximation algorithms that access only a minuscule fraction of the input, focusing on two classical sources of problems: combinatorial graph optimization and manipulation of strings. The tools we develop find applications outside the area of sublinear algorithms. For instance, we obtain a more efficient approximation algorithm for edit distance, and distributed algorithms for combinatorial problems on graphs that run in a constant number of communication rounds.

    Combinatorial Graph Optimization Problems: The graph optimization problems we consider include vertex cover, maximum matching, and dominating set. A graph algorithm is traditionally called a constant-time algorithm if it runs in time that is a function of only the maximum vertex degree and, in particular, does not depend on the number of vertices in the graph. We show a general local computation framework that allows many classical greedy approximation algorithms to be transformed into constant-time approximation algorithms for the optimal solution size. By applying the framework, we obtain the first constant-time algorithm that approximates the maximum matching size up to an additive εn, where ε is an arbitrary positive constant and n is the number of vertices in the graph. It is known that a purely additive εn approximation is not computable in constant time for vertex cover and dominating set. We show that such an approximation is nevertheless possible for a wide class of graphs, which includes planar graphs (and other minor-free families of graphs) and graphs of subexponential growth (a common property of networks). This result is obtained by locally computing a good partition of the input graph within our local computation framework. The tools and algorithms developed for these problems find several other applications:
    - Our methods can be used to construct local distributed approximation algorithms for some combinatorial optimization problems.
    - Our matching algorithm yields the first constant-time testing algorithm for distinguishing bounded-degree graphs that have a perfect matching from those far from having this property.
    - We give a simple proof that there is a constant-time algorithm distinguishing bounded-degree graphs that are planar (or, in general, have a minor-closed property) from those that are far from planarity (or the given minor-closed property, respectively). Our tester is also much more efficient than the original tester of Benjamini, Schramm, and Shapira (STOC 2008).
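    To make the local computation idea concrete, here is a minimal sketch, with illustrative names and not the thesis's exact construction, of how a classical greedy algorithm (maximal matching under a random edge ordering) can be simulated locally: to decide whether one edge is matched, the oracle only recurses into adjacent edges of lower random rank, and sampling vertices then yields an estimate of the matching size.

```python
import random

def edge_rank(u, v, seed):
    # Deterministic pseudo-random priority for edge {u, v}; this plays
    # the role of a random permutation of all edges.
    return random.Random(hash((min(u, v), max(u, v), seed))).random()

def in_matching(u, v, graph, seed, memo):
    """Local oracle: is edge {u, v} in the greedy matching induced by the
    random ranks? An edge is matched iff every adjacent edge of lower rank
    is unmatched, so the recursion only explores lower-rank neighbors."""
    key = (min(u, v), max(u, v))
    if key in memo:
        return memo[key]
    r = edge_rank(u, v, seed)
    for w in (u, v):
        for z in graph[w]:
            if {w, z} == {u, v}:
                continue  # skip the edge itself
            if edge_rank(w, z, seed) < r and in_matching(w, z, graph, seed, memo):
                memo[key] = False
                return False
    memo[key] = True
    return True

def estimate_matching_size(graph, samples=1000, seed=0):
    """Sample vertices and query the local oracle on their incident edges.
    On bounded-degree graphs the exploration stays local in expectation,
    which is the source of the constant (degree-dependent) running time."""
    n = len(graph)
    vertices, memo, matched = list(graph), {}, 0
    rng = random.Random(seed)
    for _ in range(samples):
        v = rng.choice(vertices)
        if any(in_matching(v, w, graph, seed, memo) for w in graph[v]):
            matched += 1
    # A matching of size m covers 2m vertices, hence the division by 2.
    return n * (matched / samples) / 2
```

    On a small bounded-degree input such as graph = {0: [1], 1: [0, 2], 2: [1]}, the estimate concentrates around the greedy matching size as the number of samples grows; greedy maximal matching is itself a 2-approximation of maximum matching.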
    Edit Distance: We study a new asymmetric query model for edit distance. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner (without charge), while being charged for querying every symbol of x. We design an algorithm in the asymmetric query model that makes a small number of queries to distinguish the case when the edit distance between x and y is small from the case when it is large. Our result in the asymmetric query model gives rise to a near-linear time algorithm that approximates the edit distance between two strings to within a polylogarithmic factor: for strings of length n and every fixed ε > 0, the algorithm computes a (log n)^O(1/ε) approximation in n^(1+ε) time. This is an exponential improvement over the previously known near-linear time approximation factor 2^Õ(√log n) (Andoni and Onak, STOC 2009; building on Ostrovsky and Rabani, J. ACM 2007). The algorithm of Andoni and Onak was, despite a sequence of papers on the problem, the first to run in O(n^(2-δ)) time for some fixed constant δ > 0 while obtaining a subpolynomial, n^o(1), approximation factor. We also provide a nearly matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", meaning that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between the complexity of approximating edit distance and Ulam distance.
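    The asymmetric query model itself is easy to set down in code. The sketch below is purely illustrative and is not the thesis's tester: it wraps x in an accessor that counts charged queries while y is read for free, and runs a naive position-sampling test. Such a test only controls Hamming distance, which upper-bounds edit distance, so it can certify closeness but a "far" verdict is unreliable; the actual algorithm is far more involved.

```python
import random

class ChargedAccess:
    """Wrapper for the string x in the asymmetric query model:
    every symbol read of x is charged, while y may be read for free."""
    def __init__(self, s):
        self._s, self.queries = s, 0
    def __getitem__(self, i):
        self.queries += 1
        return self._s[i]
    def __len__(self):
        return len(self._s)  # the length n is assumed to be known for free

def naive_sampling_test(x: ChargedAccess, y: str, samples: int = 300) -> str:
    """Sample positions of x and compare against y at the same index.
    Few mismatches imply small Hamming distance, and hence small edit
    distance (which Hamming distance upper-bounds). The converse fails:
    shifting a string by one symbol mismatches almost everywhere yet
    has edit distance 2, so 'possibly far' is not a proof of farness."""
    n = len(x)
    mismatches = sum(x[i] != y[i] for i in (random.randrange(n) for _ in range(samples)))
    return "close (whp)" if mismatches / samples < 0.05 else "possibly far"  # 0.05 is an arbitrary demo threshold

x = ChargedAccess("a" * 1000)
print(naive_sampling_test(x, "a" * 999 + "b"), "using", x.queries, "charged queries")
```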

    Round Compression for Parallel Matching Algorithms

    For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is that these frameworks accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has n^(1+Ω(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly O(log n) rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+ε)-approximation to maximum matching, for any fixed constant ε > 0, in O((log log n)^2) rounds.

    Scalable fair clustering

    We study the fair variant of the classic k-median problem introduced by Chierichetti et al. (2017), in which the points are colored, and the goal is to minimize the same average distance objective as in the standard k-median problem while ensuring that all clusters have an "approximately equal" number of points of each color. Chierichetti et al. proposed a two-phase algorithm for fair k-clustering. In the first step, the point set is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.
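    As a concrete illustration of the fairlet notion, here is a toy version, not the paper's nearly linear time algorithm: with two colors and an exact 1:1 balance requirement, every fairlet is a red-blue pair, and a simple quadratic-time greedy decomposition pairs each red point with the nearest unused blue point.

```python
import math

def fairlet_decomposition_1_1(reds, blues):
    """Toy (1, 1)-fairlet decomposition for two equal-size color classes:
    greedily pair each red point with the nearest blue point not yet used.
    This takes quadratic time; the paper's contribution is computing an
    approximate decomposition in nearly linear time."""
    assert len(reds) == len(blues), "an exact 1:1 balance needs equal class sizes"
    unused = list(blues)
    fairlets = []
    for r in reds:
        b = min(unused, key=lambda p: math.dist(r, p))
        unused.remove(b)
        fairlets.append((r, b))
    return fairlets

# Second phase: each fairlet contributes one representative point to an
# ordinary k-median instance, solved by any off-the-shelf algorithm.
reds = [(0.0, 0.0), (4.0, 4.0)]
blues = [(0.5, 0.0), (4.0, 3.5)]
print(fairlet_decomposition_1_1(reds, blues))
```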

    Walking Randomly, Massively, and Efficiently


    Round Compression for Parallel Matching Algorithms

    © 2019 Society for Industrial and Applied Mathematics. For over a decade now we have been witnessing the success of massive parallel computation frameworks, such as MapReduce, Hadoop, Dryad, or Spark. Compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, however, is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. Lattanzi et al. [SPAA, ACM, New York, 2011, pp. 85-94] showed that if each machine has n^(1+Ω(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. In this paper, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (1+ε)-approximate maximum matching, for any fixed constant ε > 0, in O((log log n)^2) rounds. To establish our result we need to deviate from the previous work in two important ways. First, we use vertex-based graph partitioning instead of the edge-based approaches that were utilized so far. Second, we develop a technique of round compression.
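    To give a flavor of the vertex-based partitioning, here is a hedged sketch with illustrative names that omits the delicate probabilistic analysis: one phase assigns vertices to machines uniformly at random, each machine keeps only the edges with both endpoints on it, and then runs many steps of a local greedy matching; compressing that whole local computation into a single communication round is the round-compression step.

```python
import random
from collections import defaultdict

def vertex_partition_phase(edges, n, machines, rng):
    """One phase of vertex-based partitioning: scatter vertices across
    machines, keep each edge only if both endpoints land on the same
    machine, and let every machine run a greedy matching on its induced
    subgraph, all within a single communication round."""
    where = [rng.randrange(machines) for _ in range(n)]
    local = defaultdict(list)
    for u, v in edges:
        if where[u] == where[v]:  # edge survives iff both endpoints co-located
            local[where[u]].append((u, v))
    matching, matched = [], set()
    for machine_edges in local.values():
        for u, v in machine_edges:  # greedy maximal matching per machine
            if u not in matched and v not in matched:
                matched.update((u, v))
                matching.append((u, v))
    return matching

# Usage sketch: repeat on the residual graph (drop matched vertices).
# The paper's analysis shows O((log log n)^2) carefully engineered phases
# suffice; this toy loop alone does not achieve that guarantee.
rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(vertex_partition_phase(edges, 4, machines=2, rng=rng))
```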